The Properties and Further Applications of Chinese Frequent Strings

Authors

  • Yih-Jeng Lin
  • Ming-Shing Yu
Abstract

This paper reveals some important properties of Chinese frequent strings (CFSs) and their applications in Chinese natural language processing (NLP). We have previously proposed a method for extracting CFSs, which include unknown words, from a Chinese corpus [Lin and Yu 2001]. We found that CFSs contain many 4-character strings, 3-word strings, and longer n-grams. With a traditional language model (LM), such information can only be derived from an extremely large corpus. In contrast, by using CFSs we can achieve high precision and efficiency in Chinese toneless phoneme-to-character conversion and Chinese spelling error correction with only a small training corpus. An accuracy rate of 92.86% was achieved for Chinese toneless phoneme-to-character conversion, and an accuracy rate of 87.32% was achieved for Chinese spelling error correction. We also attempted to assign syntactic categories to CFSs. The accuracy rate for this assignment was 88.53% in outside testing when the highest-level syntactic categories were used.

Similar resources

Extracting Chinese Frequent Strings Without a Dictionary From a Chinese Corpus and its Applications

This paper describes how to extract Chinese frequent strings without using a dictionary. In this paper, we generalize the notations of words and unknown words to those of frequent strings. The Chinese frequent strings (CFSs) we define include words, unknown words, and other strings that are frequently used. Some examples of CFSs are “ (can only let)”, “ (every minute and every second)”, “ (bear...

Group Generalized Interval-valued Intuitionistic Fuzzy Soft Sets and Their Applications in Decision Making

Interval-valued intuitionistic fuzzy sets (IVIFSs) are widely used to handle uncertainty and imprecision in decision making. However, in more complicated environments, it is difficult to express uncertain information with an IVIFS while also considering the decision-making preference. Hence, this paper proposes a group generalized interval-valued intuitionistic fuzzy soft set (G-GIVIFSS) which conta...

Synthesis, In Vitro activity and Metabolic Properties of Quinocetone and Structurally Similar Compounds

To investigate the cytotoxicity mechanism of quinocetone from the perspective of chemical structure, quinocetone and other new quinoxaline-1,4-dioxide derivatives were synthesized, and evaluated for their activities, and analysed for the metabolic characteristics. Quinocetone and other new quinoxaline-1,4-dioxide derivatives were synthesized, and evaluated for their activities, and analysed for...

Applications of TP2 Functions in Theory of Stochastic Orders: A Review of some Useful Results

In the literature on Statistical Reliability Theory and Stochastic Orders, several results based on theory of TP2/RR2 functions have been extensively used in establishing various properties. In this paper, we provide a review of some useful results in this direction and highlight connections between them.

Journal:
  • IJCLCLP

Volume 9

Publication date: 2004